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1  Introduction 


Much  of  operations  research  and  management  science  practice  centers  on  utilizing  available  data  to 
gain  insight  about  a  phenomenon,  predict  future  performance,  and  instantiate  optimization  and  simu¬ 
lation  models.  The  process  intersects  several  disciplines.  Computer  scientists  refer  to  it  as  “learning.” 
Statisticians  talk  about  “data  analysis,”  “regression,”  and  “inference.”  At  a  fundamental  level,  the 
process  relates  to  function  approximation  theory  of  mathematics,  which  has  a  long  and  prestigious 
history.  Initially,  polynomial-type  approximations  and  generalized  Fourier  series  were  the  mainstay  of 
approximation  theory  both  at  the  theoretical  and  computational  levels.  With  the  advent  of  digital 
computers,  piecewise  polynomials  in  the  form  of  splines  became  computationally  tractable  and  offered 
the  possibility  of  much  more  accurate  and  stable  approximations. 

Although  observed  data  about  a  phenomenon  receives  the  primal  attention,  at  least  in  theoretical 
studies,  additional  information  derives  from  a  much  wider  range  of  sources.  Data  is  never  obtained  in 
a  vacuum,  but  rather  in  a  context  that  provides  external  information  in  a  variety  of  ways.  Physical 
laws  about  the  phenomenon  may  dictate  that  a  regression  line  is  increasing.  Asymptotic  theory  could 
indicate  that  a  probability  density  function  is  “nearly”  normal.  An  analyst’s  experience  gives  a  host  of 
“soft”  information,  maybe  about  ranges  of  possible  values,  signs  of  correlations,  and  number  of  modes. 
It  is  essential,  especially  when  the  data  is  limited,  corrupted,  and  otherwise  deficient,  to  utilize  all 
available  information,  both  from  data  and  external  sources.  This  is  intuitively  understood  and  widely 
exploited  in  practice.  However,  the  theoretical  foundation  and  computational  tools  for  carrying  out  this 
process  are  incomplete.  In  this  tutorial,  we  describe  recent  efforts  to  remedy  the  situation  through  the 
formulation  of  optimization  problems  and  the  examination  of  such  problems  via  variational  analysis. 
Specifically,  we  address  the  problem  of  how  to 

identify  a  function  that  best  represents  data  and  also  satisfies  information- driven  constraints. 

The  problem  arises  broadly  in  curve  fitting,  interpolation,  regression,  density  estimation,  and  numerous 
other  contexts.  In  §2,  we  describe  applications  in  the  financial  and  commodity  markets,  and  list  several 
others.  Numerical  examples  are  given  in  §6.  The  information-driven  constraints  are  formulated  based 
on  external  information  as  well  as  the  data,  and  come  in  a  multitude  of  forms  as  illustrated  below.  The 
“best”  is  quantified  by  a  criterion  that  could  be  least-squares,  maximum  likelihood,  minimum  error 
measure,  etc.  We  refer  to  this  problem  as  the  function  identification  problem  (FIP). 

The  generality  of  FIP  and  its  central  position  in  a  wide  array  of  quantitative  fields  give  rise,  natu¬ 
rally,  to  a  truly  immense  literature  on  the  subject.  We  make  no  attempt  to  present  a  comprehensive 
treatment,  but  instead  offer  a  biased  view  grown  out  of  our  gradual  realization  of  the  central  role  of 
variational  analysis  in  problems  of  this  kind.  We  formulate  FIP  as  a  constrained  infinite-dimensional 
optimization  problem  that  represents  an  actual  problem.  Incomplete  information  and  simplifications 
introduced  for  computational  reasons  lead  to  approximate  optimization  problems.  These  problems  are 
subsequently  solved  by  standard,  or  possibly  specialized,  optimization  algorithms  to  obtain  best  fits, 
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estimates,  and  approximations.  We  then  rely  on  variational  analysis  to  examine  whether  the  solutions 
of  the  approximate  problems  are  indeed  approximations  of  solutions  of  the  actual  problem,  study  as¬ 
sociated  convergence  rates,  and  obtain  other  insights.  We  refer  to  solutions  of  an  actual  problem  as 
actual  functions. 

Since  determining  an  arbitrary  function  requires  pinning  down  an  infinite  number  of  parameters,  an  un¬ 
avoidable  simplification  comes  from  replacing  such  functions  by  simpler  ones  that  are  fully  characterized 
by  a  finite  number  of  parameters.  The  consideration  of  simple,  but  hopefully  representative,  functions 
is  of  course  common  in  linear  regression,  parametric  density  estimation,  interpolation,  and  many  other 
areas.  Epi-splines  are  newly  developed  piecewise  polynomial  functions,  described  by  a  finite  number 
of  parameters,  that  are  exceptionally  flexible.  In  fact,  epi-splines  approximate  essentially  any  function 
one  can  reasonably  expect  to  encounter  in  practice,  including  those  with  discontinuities,  unbounded 
derivatives,  and  even  the  function  values  — oo  and  oo.  We  provide  an  introduction  to  epi-splines  and 
their  many  applications;  see  [22,  21,  20]  and  reference  therein  for  further  details. 

The  remaining  of  the  overview  is  organized  as  follows.  §2  provides  a  glimpse  of  the  many  applications 
served  by  epi-spline  technology.  The  mathematical  formulation  of  FIP  and  definition  of  epi-splines  are 
given  in  §3.  The  examination  of  FIP  relies  on  variational  analysis,  with  pertinent  facts  reviewed  in  §4. 
§5  summarizes  theoretical  results  that  justify  the  application  of  epi-spline  technology.  The  tutorial  ends 
in  §6  with  numerical  examples. 

2  Applications 

Despite  their  recent  discovery,  epi-splines  are  applied  in  numerous  areas.  The  first  explicit  use  of  epi- 
splines,  under  the  name  epi-curves,  was  in  the  context  of  deriving  the  zero-curves  associated  with  a 
family  of  financial  instruments  [28,  27];  EpiRisk  Research  used  it  to  build  interest  rates  and  currency 
exchange  models  and  this  has  now  also  been  packaged  by  independent  software  providers.  §2.2  provides 
an  illustration.  In  the  energy  domain,  epi-splines  are  used  to  compute  day-ahead  forecasts  of  electricity 
(load)  demand  [13,  16].  In  fact,  there  epi-splines  are  the  main  tools  for  building  a  stochastic  process 
that  not  only  provide  a  point  forecast,  but  conditional  distributions  of  demand  for  a  24-hour  period. 
The  situation  is  similar  in  forecasting  of  commodity  prices,  where  epi-splines  result  in  remarkably  ac¬ 
curate  estimates  of  copper  prices  [29].  Epi-splines  are  well-suited  for  probability  density  estimation, 
especially  in  the  context  of  little  data  [7,  25,  22],  Uncertainty  quantification  and  simulation  input  and 
output  analysis  are  applications  that  directly  benefit  from  the  epi-spline  technology  for  density  estima¬ 
tion  [19,  24].  Variograms  are  central  tools  in  spatial  statistics  and  preliminary  tests  show  the  promise 
of  epi-splines  also  in  that  area  [21], 

Initial  efforts  on  epi-splines  centered  on  functions  defined  on  a  compact  subset  of  the  real  line;  see  [21] 
for  an  overview.  Although  this  covers  many  important  applications,  higher  dimensions  undoubtedly 
arise  too.  Initial  efforts  in  image  reconstruction  [25,  20],  density  estimation  [25,  20],  response  surface 
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Figure  1:  Curve  fitting  with  polynomial  of  degree  5. 

building  [20],  and  graph  models  show  the  possibilities  (§2.4).  In  this  section,  we  provide  a  few  examples 
that  help  motivate  the  analysis  of  FIP,  with  additional  ones  furnished  by  §6. 

2.1  Curve  Fitting 

An  elementary  example  from  curve  fitting  highlights  central  components  of  FIP.  Figure  1  illustrates 
20  data  points  and  plots  a  least-squares  approximating  5th-degree  polynomial.  The  polynomial  fit  is 
clearly  too  coarse,  with  polynomials  of  other  degrees  resulting  in  even  worse  approximations.  This 
was  early  recognized  and  was  a  major  motivation  for  the  development  of  fitting  procedures  based  on 
piecewise  polynomials,  specifically  splines  [23].  We  fit  the  data  relying  on  four  such  procedures.  Each 
one  assumes,  implicitly,  that  the  actual  function  belongs  to  a  particular  family  of  functions  and  relies 
on  a  specific  fitting  criterion.  The  first  three  assume  that  the  function  is  continuous  whereas  the  last 
one  allows  for  a  potential  discontinuity  at  x  =  0.525.  We  begin  by  relying  on  common  cubic  spline 
fitting.  The  solid  line  in  Figure  2  plots  the  cubic  interpolating  spline  with  the  usual  Lagrangian  end 
conditions  whereas  the  dashed  line  is  the  least-squares  cubic  smoothing  spline  with  smoothing  penalty 
1  •  10~4  assigned  to  the  second-order  derivative  term.  The  fits  improve  over  the  polynomial  fit,  though 
the  “smoothness”  of  the  smoothing  spline  curve  depends  on  the  choice  of  penalty. 

The  final  two  procedures  rely  on  epi-splines  built  from  second-degree  polynomials.  Suppose  that  in 
addition  to  the  data,  the  analyst  believes  that  the  actual  function  is  continuously  differentiable,  with 
second-order  derivatives  in  the  range  [—100,100].  This  external  information  forms  constraints  on  the 
allowable  functions  that  together  with  a  least-squares  criterion  give  the  dashed  line  in  Figure  3.  The 
epi-spline  line  resembles  the  smoothing  spline  of  Figure  2,  with  the  former  having  a  slightly  better  fit 
at  some  points.  The  smoothing  spline  is  a  result  of  minimizing  squared  error  plus  a  weighted  integral  of 
second-order  derivatives.  The  epi-spline  simply  minimizes  the  squared  error,  but  accounts  for  (point- 
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Figure  2:  Cubic  interpolating  and  smoothing  spline  fits, 
wise)  bounds  on  the  second-order  derivatives. 

Suppose  that  external  information  further  indicates  that  there  may  be  a  discontinuity  around  x  =  0.525. 
Then  an  epi-spline  illustrated  by  the  solid  line  in  Figure  3  yields  what  appears  to  be  the  best  fit  seen 
so  far.  Although  not  pursued  further  here,  other  epi-splines  could  be  calculated  that  satisfy  additional 
constraints  derived  from  external  information  about  concavity  and  other  jumps,  or  by  collecting  further 
data  points.  A  sequence  of  such  problems  can  be  thought  of  as  approximations  of  an  actual  FIP 
involving  an  infinite  number  of  data  points  and  general  functions.  Below  we  discuss  justifications 
for  such  approximations,  especially  those  involving  replacing  general  functions  under  consideration  by 
epi-splines. 

2.2  Financial  Curves 

It  is  standard  procedure  in  financial  engineering  to  use  a  family  of  similar  instruments  consisting  of 
Treasury  bonds,  annuities,  commercial  papers  and/or  related  assets  to  estimate  their  zero-curves  in¬ 
cluding  the  spot  rate  curve,  the  forward  rate  curve,  and  the  discount  factor  curve.  These  curves  are 
intimately  related  as  any  one  of  them  can  be  derived  from  knowing  any  one  of  the  others;  see  [28,  27] 
and  references  therein.  Here,  we  consider  the  problem  of  finding  a  discount  factor  curve  d  for  the  sake 
of  illustration. 

Suppose  that  for  each  instrument  i  =  1,2,...,/,  there  is  a  cash- flow  yielding 

cash  payments  pl0,p\, . . .  ,plN.  at  points  in  time  ,  tlN.,  respectively. 

The  first  one  of  these  “payments”  usually  corresponds  to  purchasing  price,  the  remaining  ones  to  the 
income  generated;  so,  pl0  is  usually  negative  whereas  the  remaining  ones  are  positive.  To  simplify  the 
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Figure  3:  Second-order  epi-spline  fits  with  and  without  discontinuity. 

notation,  we  let  tl0  =  0  (today)  and  T  >  t) v.  .  For  example,  T  could  related  to  the  longest  maturity  time 
of  all  the  instruments.  Since  one  can  reasonably  assume  that  the  discounted  value  of  future  payments 
balances  the  initial  expense  or,  more  generally,  the  signed,  discounted  payment  stream  equals  zero, 
identifying  a  discount  factor  curve  amounts  to  finding  a  real- valued  function  d  on  [0,  T]  with 

Ni 

d(fj)plj  =  0  for  all  i  =  1,2, ...,  I. 

3= o 

External  information  about  the  nature  of  discount  factor  curves  dictates  further  that  d( 0)  =  1  (the 
value  of  money  today  is  its  face  value),  d  >  0  (discount  factors  are  nonnegative  numbers),  and  the 
derivative  d'  <  0  (discount  factors  cannot  increase  with  time).  Additional  information  that  relates  to 
the  desirable  properties  of  the  associated  forward  rates  curves,  cf,  [27,  §1],  justifies  the  restrictions  to 
smooth  curves  and  we  therefore  also  require  that  d  is  continuously  differentiable.  Since  ensuring  that 
all  of  the  above  I  equations  are  satisfied  may  not  be  possible  due  to  inconsistent  pricing  in  the  markets, 
we  formulate  the  problem  in  terms  of  a  norm-minimizing  optimization  problem: 

such  that  d{ 0)  =  1,  d  >  0,  d'  <  0, 

where  it  is  implicitly  assumed  that  d  is  a  continuously  differentiable  function  and  ||  •  ||  is  a  well-chosen 
norm.  The  problem  is  clearly  a  FIP,  with  constraints  derived  from  insight  about  the  situation. 

2.3  Commodity  Price  Forecasts 

A  more  complicated  example  arises  in  commodity  pricing.  In  [29,  §3],  epi-spline  technology  builds  a 
stochastic  process  for  short-term  copper  prices,  say  for  the  next  6-12  months.  The  effort  relies  on  a 
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Figure  4:  Epi-spline  estimates  of  copper  prices  (a)  with  and  (b)  without  future  contract  information. 

common  model  for  the  evolution  of  copper  prices  as  well  as  closely  related  quantities  such  as  interest 
rates  and  exchange  rates  based  on  geometric  Brownian  motion.  The  drift  and  volatility  terms  in  the 
corresponding  stochastic  differential  equation,  however,  are  estimated  using  epi-splines.  The  available 
information  is  historical  prices,  see,  for  example,  the  line  in  Figure  4(a)  that  stretches  across  the  figure, 
where  “today”  is  month  0  so  that  prices  only  up  to  that  time  is  available.  Moreover,  we  also  know 
current  prices  of  future  contracts  for  copper  deliveries  as  well  as  for  various  interest  rate  and  currency 
exchange  instruments. 

Epi-spline  technology  enters  then  twice.  First,  prices  of  future  contracts  are  converted  into  (implied 
future)  spot  prices.  This  is  achieved  by  computing  a  discount  factor  curve  in  the  manner  described 
above  and  then  converting  it  into  a  spot  rate  curve  using  standard  expressions.  Second,  the  historical 
prices,  today’s  price,  and  the  spot  market  prices  just  computed  for,  say,  the  next  9  months  provide 
data  that  is  fitted  using  a  second-order  epi-spline  under  the  additional  external  information  that  it  is 
continuously  differentiable.  The  resulting  fit  provides  estimated  drift  terms  for  the  stochastic  differen¬ 
tial  equation.  The  solid  line  in  Figure  4(a)  starting  at  month  zero  shows  the  price  drift  for  the  next 
12  months.  We  note  that  in  this  approach  the  initial  conditions  of  the  process  is  not  today’s  observed 
price.  A  justification  for  proceeding  in  this  fashion  is  that  one  should  view  today’s  (observed)  spot 
price  as  the  “actual”  spot  price  perturbed  by  some  noise;  empirical  calculations  carried  out  in  [29,  §6.1] 
confirm  that  this  approach  yields  much  better  results. 

Finally,  the  volatility  parameters  of  the  process  are  obtained  by  a  least-square  fit  of  the  covariance  after 
adjusting  the  observed  historical  prices  and  the  forecasted  spot  prices  by  subtracting  their  estimated 
expectation;  for  more  about  this  approach  cf.  [29] .  Figure  4(a)  illustrates  the  volatility  with  dotted  lines 
corresponding  to  the  95%  confidence  interval  for  the  process  at  each  point  in  time.  Figure  4(b)  provides 
similar  results,  but  without  using  epi-spline  technology  to  convert  information  about  future  contracts 
into  implied  future  spot  prices.  It  is  apparent  that  including  this  additional  external  information 
dramatically  improves  the  quality  of  the  estimates. 
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2.4  Density  Estimation  on  Graphs 


A  final  illustration  is  provided  by  a  problem  in  graph  theory.  There  is  a  rapidly  growing  literature  on 
models  for  graphs  that  rely  on  exponential  families  of  probability  density  functions 

hp(G)  =  exp  /3iTi(G )  +  cp 
\t=i 

to  describe  the  likelihood  of  a  graph  G  characterized  by  its  number  of  edges  (Ti(G)),  triangles  (22(G)), 
stars  (23(G)),  cycles  (24(G)),  etc.,  with  (5  =  (/3i,/32,  ...,/3j)  being  a  vector  of  parameters  and  cp  a  nor¬ 
malization  coefficient;  see  [8]  and  references  therein.  From  observed  data  in  the  form  of  a  collection 
of  graphs,  there  are  significant  challenges  to  estimate  the  parameters  f3.  In  [8],  we  find  an  approach 
that  relies  on  graph  limits  in  the  sense  of  Lovasz  and  large-deviation  results  for  random  graphs,  and 
leads  to  an  infinite-dimensional  problem  whose  optimal  value  is  an  estimate  of  the  crucial  normalization 
constant  cp. 


Specifically,  following  [8],  we  define 


ipo(f)=  /  /  lf{xi,x2)logf(x1,x2)  +  -  f(x1,x2))log(l  -  f(xi,x2))dxidx2 

Jo  Jo 

ipi(f)  =  /  /  f(xi,x2)dxidx2 

Jo  Jo 

v>2  (/)  =  /  /  /  f(xl,x2)f(x2,x3)f(xl,x3)dxidx2dx3. 

Jo  Jo  Jo 


The  infinite-dimensional  problem  is  then  to 


minimize  i/>o(/)  ~  Piipi  (/)  —  P2i/^2  (/)  such  that  /  symmetric  and  0  <  /  <  1 

over  all  measurable  functions  on  [0,  l]2.  Here,  symmetric  means  that  f(x i,x2)  =  f(x 2,x\)  for  all 
(xi,x2)  €  [0,  l]2.  An  approach  relying  on  epi-splines  for  solving  this  problem  is  currently  under  inves¬ 
tigation.  Figure  5  provides  an  example  of  a  minimizing  function  /  for  the  choice  /3\  =  4  and  f32  =  —7. 
This  problem  only  indirectly  relates  to  “fitting  data,”  and  shows  the  breath  of  possible  applications  of 
the  epi-spline  technology. 


3  Problem  Formulations 

As  indicated  above,  FIP  is  an  optimization  problem  and  we  now  turn  to  its  formulation.  In  particular, 
we  make  distinction  between  an  actual,  conceptual  problem  and  an  approximate  problem  to  be  solved 
computationally.  The  section  ends  with  formulation  of  constraints  derived  from  external  information 
and  practical  guidelines. 
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Figure  5:  Example  of  minimizing  function  /  in  graph  density  problem. 

3.1  Actual  Problem 

We  formulate  FIP  as  the  problem  of  determining  a  function  /  defined  on  Mn  that  minimizes  some 
criterion  given  by  a  functional  tjj,  while  also  satisfying  constraints  given  by  a  feasible  set  F.  The  problem 
may  be  conceptual,  involving  information  not  available  and  requiring  operations  not  implementable 
during  its  solution.  This  “actual  problem”  therefore  takes  the  form 

Actual  Problem  (P)  :  min i/j(f)  such  that  /  G  F  C  F, 

where  F  is  a  space  of  functions.  The  criterion  functional  i/j  could  be,  for  example, 

J 

V’(Z)  =  y 3  -  fi?3))2 

3  =  1 
J 

V’(z)  =  \y3  - 

3= 1 

iff)  =  max  | y3  -  f(xJ) \ 

3  =  1,..., J 

where  {(x-7,  yi)}j=1  is  observed  data.  These  terms  express  errors  from  the  observed  data  as  measured 
by  mean  squared,  mean  absolute,  and  max  absolute  errors,  respectively.  More  generally  for  a  known 
reference  function  /°,  we  may  have 


W)  =  ll/-/° 
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where  ||  -  ||  is  a  norm  defined  on  the  space  of  functions  under  consideration.  For  random  variables  X 
and  Y,  with  X  ?r-dimensional,  expressing  possible  “future”  data,  examples  of  criteria  are 

V>(/)  =  E\f(X)] 

W)  =  ^[max{0,  Y  -  f(X)}}/(  1  -  a)  +  E[Y  -  f(X)} 
iP(f)  =  £(Y-f(X)). 

The  first  example  focuses  on  the  expected  value  of  the  random  variable  f(X)  as  arises  in  density 
estimation  using  log-likelihood  maximization  and  an  exponential  transformation;  see  [22].  The  second 
expression  states  the  Koenker-Bassett  error  used  in  quantile  regression,  with  a  G  [0, 1).  The  third 
gives  the  error  between  random  variables  Y  and  f(X )  as  expressed  by  an  error  measure,  of  which  the 
Koenker-Bassett  and  mean  squared  errors  are  special  cases;  see  [17]  for  other  examples  that  lead  to 
many  different  types  of  generalized  regression.  Two  additional  examples  are  furnished  by 

*l>(f)  =  j  \\f'(x)\\2dx 
'ip(f)  =  min  f(x), 

X 

which  do  not  directly  relate  to  the  data.  The  first  one  is  common  in  applications  such  as  interpolation. 
Here,  the  criterion  is  to  obtain  a  function  with  minimum  “variation”  as  measured  by  the  integrated 
squared- value  of  the  second-order  derivative  or,  more  generally,  a  squared  norm  of  the  Hessian  matrix. 
The  last  example  illustrates  the  versatility  of  the  framework  as  it  allows  for  criteria  functionals  that 
themselves  involve  optimization. 

The  feasible  region  F  likewise  comes  in  a  large  variety  of  forms.  The  restrictions  to  functions  that  are 
smooth,  convex,  monotone,  log-concave2  naturally  arise  in  applications.  For  example,  an  analyst  may  be 
rather  certain  that  increasing  input  results  in  increasing  output  in  a  regression  analysis.  Consequently, 
she  should  restrict  the  consideration  to  increasing  functions.  The  prevalence  of  “normal- looking”  prob¬ 
ability  density  functions  in  an  area  may  convince  an  analyst  to  only  consider  log-concave  densities, 
which  have  only  one  mode.  However,  the  possibilities  extend  much  further.  The  functions  of  interest 
in  an  application  may  be  known  to  attain  specific  values  at  a  finite  number  of  points,  or  be  zero,  — oo, 
or  oo  outside  a  certain  (compact)  set.  The  functions  could  be  nonsmooth,  but  with  subgradients  that 
are  contained  in  certain  sets  and  possess  specific  properties,  or  could  be  within  some  distance  from  a 
reference  function.  In  fact,  any  one  application  may  need  to  contend  with  many  or  all  of  this  external 
information  at  the  same  time.  At  the  end  of  this  section,  we  give  some  concrete  examples  of  how  these 
pieces  of  information  can  be  formulated  mathematically  within  an  epi-spline  framework.  We  refer,  for 
example,  to  [22]  for  illustrations  of  the  effect  of  external  information  in  improving  estimates. 

The  distinction  between  a  criterion  functional  and  constraints  defining  the  feasible  set  is  less  important 
than  it  first  appears.  It  is  well-known  in  practical  application  of  optimization  models  and  methods  that 

2  We  recall  that  a  function  /  is  log-concave  if  log  /  is  concave. 
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an  objective  function  can  be  converted  into  a  constraint  and  a  constraint  function  can  take  the  place 
as  objective  function  with  computational  and  modeling  benefits.  However,  some  requirements  such  as 
continuity,  convexity,  and  differentiability  are  most  naturally  formulated  as  constraints.  Quantifications 
of  “goodness-of-fit”  lend  themselves  as  criteria  functionals.  Typically,  application-specific  conditions 
indicate  the  most  beneficial  formulation,  with  the  opportunity  to  include  arbitrary  constraints  in  (P) 
offering  significant  flexibility. 

Two  classical  examples  provide  additional  context.  A  problem  of  smooth  interpolation  aims  to  deter¬ 
mine  a  function  /  defined  on  ]R  that 

minimizes  J  f"{xfdx  such  that  f(xj)  =  yj  for  all  j  =  1,  2...,  J, 

i.e.,  to  determine  the  “smoothest”  function  that  interpolates  the  points  (x3,y3),  j  =  1,2, ...,  J.  When 
restricted  to  functions  in  a  certain  Sobolev  space,  the  unique  solution  of  this  problem  is  a  cubic  spline3 
[15];  see  also  [11,  1]  for  numerous  extensions.  The  solid  line  in  Figure  2  illustrates  such  a  cubic  spline 
for  the  data  set  of  Subsection  2.1.  Cubic  smoothing  splines  balance  interpolation  of  the  data  with 
“smoothness”  of  the  curve  and  arise  as  optimal  solutions  of 

minify  -  fix3))2  +  A  J  f"{x)2dx, 

3= 1 

where  A  >  0  is  a  smoothing  penalty  [26,  12].  The  dashed  line  of  Figure  2  provides  an  illustration  for 
A  =  1  •  10~4.  Other  “regularizations”  by  t\-  and  ^o-norms  lead  to  similar  problems.  It  is  clear  that 
these  examples  are  special  cases  of  the  actual  problem  (P). 

In  the  interpolation  and  smoothing  spline  examples,  optimal  solutions  are  automatically  cubic  splines 
and  the  choice  of  function  space  to  consider  falls  naturally  to  a  Sobolev  space  to  ensure  existence  and 
integrability  of  second-order  derivatives.  The  general  case  (P)  requires  more  care  to  ensure  a  sufficiently 
rich  class  of  functions  that  can  capture  (essentially)  all  possibilities  arising  in  practice.  We  would  like 
to  allow  for  discontinuous  functions,  functions  with  arbitrary  rates  of  growth,  for  example  unbounded 
derivatives,  and  functions  that  assume  the  values  — oo  and  oo.  Discontinuities  certainly  arise  in  the 
nonparametric  estimation  of  densities,  in  response  surface  building,  and  in  curve  fitting.  Curve  fitting 
may  also  lead  to  functions  that  grow  too  fast  to  be  integrable  in  some  sense.  The  possibility  of  function 
values  —  oo  and  oo  may  be  needed  when  implicit  constraints  are  present,  for  example  in  curve  fitting  of 
functions  with  effective  domain  less  than  the  whole  domain;  it  may  also  arise  in  applications  involving 
rescaling,  exponential  rescaling  for  instance,  where  zero  values  need  to  be  allowed.  Consequently, 
we  let  the  space  of  function  under  consideration,  P,  be  the  set,  or  possibly  a  subset,  of  lower  semi- 
continuous  (lsc)  functions  from  Mn  to  the  extended  real  line  1R  :=  M  U  {— oo,  oo},  excluding  the  trivial 
function  /  =  oo,  which  is  identical  to  infinity  everywhere.  The  left  portion  of  Figure  6  illustrates  a 

3Cubic  splines  are  piecewise  polynomial  functions  of  third  degree  that  are  twice  continuously  differentiable. 
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Figure  6:  Lower  (left)  and  upper  (right)  semi-continuous  functions. 


lsc  function  on  ]R.  As  shown,  a  lsc  function  may  be  extended  real-valued  with  values  — oo  and  oo,  and 
have  discontinuities,  at  which  point  the  function  “jumps  down.”  The  formal  definition  of  a  lsc  function 
requires  additional  notation.  For  any  /  on  ]Rn,  we  let 


liminf  fix')  :=  lim 
x,—>x  54.0 


inf  fix') 

x'eB(x,S) 


where  ]B(x,5)  :=  {x1  €  Mn  |  ||a/  —  x\\  <  5}  is  the  ball  of  radius  5  centered  at  x.  Informally,  we  think  of 
liminf X'^xf{x')  as  the  smallest  value  of  /  “near”  x.  Then, 


a  function  /  is  lsc  at  a  point  x  if  liminf  f(xr)  >  f(x). 

x'—^x 

A  function  is  lsc  if  it  is  lsc  for  all  x  €  Mn.  We  denote  by  lsc- fens (-#?”)  this  set  of  lsc  functions,  excluding 
/  =  00.  The  lsc  functions  are  complemented  by  upper  semi-continuous  functions  that  “jumps  up”  at 
points  of  discontinuity,  as  illustrated  by  the  right  portion  of  Figure  6.  The  definition  is  similar.  In  fact, 
the  lsc  functions  yield  the  upper  semi-continuous  functions  after  a  sign  change.  A  parallel  development 
with  such  functions  is  therefore  mostly  superfluous. 


Since  external  information  may  indicate  that  the  consideration  of,  for  example,  continuous  or  contin¬ 
uously  differentiable  functions  suffices,  not  all  applications  require  the  full  generality  of  lsc-fcns(-Kn). 
We  therefore  let  F  in  the  actual  problem  be  equal  to  lsc-fcns(  JZ”)  or  an  appropriate  subset.  The  choice 
of  representing  external  information  by  the  constraint  set  F,  or,  alternatively,  in  the  definition  of  the 
space  of  functions  F,  is  purely  a  technicality. 

3.2  Approximations 

The  actual  problem  (P)  rarely  admits  closed  form  solutions  and  computational  strategies  become  es¬ 
sential.  However,  optimization  over  functions  instead  of  over  a  finite  number  of  parameters  is  not 
implementable  in  a  computational  method.  The  criterion  functional  may  also  involve  integrals  and  the 
number  of  constraints  could  be  infinite,  both  requiring  some  sort  of  approximation.  The  actual  problem 
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Figure  7:  Mesh  and  epi-spline  on  ]R. 

could  also  represent  an  “ideal”  situation  with  full  information  that  should  be  contrasted  with  a  problem 
involving  only  observed  data.  Regardless  of  the  situation,  there  is  a  need  for  considering  approxima¬ 
tions  of  (P) .  We  start  with  defining  approximations  of  lsc  functions  in  terms  of  simpler  functions  called 
epi-splines  that  involve  only  a  finite  number  of  parameters. 

Epi-splines  are  defined  on  Mn ,  but  we  first  present  the  one-dimensional  case  for  simplicity. 
One-dimensional  epi-splines  on  a  mesh.  A  mesh  is  a  finite  number  of  points 

— oo  =  mo  <  mi  <  m2  <  ...,  <  rriN-i  <  mN  =  00 

on  the  real  line  as  illustrated  by  Figure  7.  An  epi-spline 4  of  order  p  on  such  a  mesh  is  a  real-valued 
function  s  that 

on  each  open  interval  (mk-i,rrih),  k  =  1,  -~,N,  is  polynomial  of  degree  p  and 

for  every  x  €  JR,  has  six)  =  min  <  lim  six  +  t),  lim  s(x  —  t) 

{ 40  m 

Figure  7  shows  an  epi-spline  of  second-order,  i.e. ,  p  =  2,  which  on  each  open  interval  is  a 

second-degree  polynomial.  An  epi-spline  of  order  p  =  0  is  a  piecewise  constant  function  and  one  of  order 
p  =  1  is  a  piecewise  affine  function  with  possibilities  of  jumps  at  the  mesh-points  mi,  m2,  tun-i- 
The  second  condition  ensures  that  the  value  s{x)  is  the  smaller  of  the  left  and  right  limits  of  s  at  x, 

4In  [20]  these  epi-splines  are  called  lsc  epi-splines,  but  here  we  exclusively  focus  on  such  epi-splines  and  the  prefix  is 
therefore  superfluous.  Basic  epi-splines  defined  only  on  a  compact  interval  of  1R  is  given  in  [21]. 
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which,  of  course,  exist  since  s  is  continuous  (polynomial)  except  at  the  mesh-points.  This  ensures  that 
s  is  lsc. 

Epi-splines  are  structurally  related  to  splines,  but  are  more  flexible.  In  fact,  interest  in  splines  of  order  p 
with  less  than  the  standard  (p—  l)-times  continuous  differentiability  at  their  “knots”  goes  back  to  Curry 
and  Schoenberg  [9,  10].  However,  mesh-points  mi, . . . ,  mjy-1  should  not  be  identified  with  the  knots  of 
such  splines  since  no  assumptions  are  made  about  information  being  available  at  mesh-points  and  no 
continuity  requirements  are  placed  on  epi-splines  at  (precisely)  these  mesh-points.  In  fact,  the  partition 
can  be  selected  completely  freely.  Continuity  and  any  other  condition  can  still  be  ensured  when  needed 
through  constraints  in  (P),  as  discussed  at  the  end  of  this  section.  Since  from  an  etymological  viewpoint 
the  (Greek)  prefix  “epi”  can  be  interpreted  as  meaning  “higher”  or  “better,”  the  term  epi-splines  is  cer¬ 
tainly  appropriate  for  these  more  “general”  splines.  In  §4  and  §5,  we  see  that  the  analysis  of  epi-splines 
relies  heavily  on  epi-convergence  and  the  epi-topology,  which  provides  further  justification  for  the  name. 

Higher-dimensional  epi-splines.  Epi-splines  in  higher  dimensions  follow  similarly,  but  require  ad¬ 
ditional  notation.  We  denote  by  cl  S  the  closure  of  a  subset  S  of  a  Euclidean  space.  A  finite  collection 
R\,  R2, Rn  of  open  subsets  of  lRn  is  a  partition  of  lRn  if 

N 

U  cl  Rk  =  TRn  and  Rk  C  Ri  =  0  for  all  k  +  1 

k= 1 

There  are  clearly  many  possible  partitions,  but,  in  practice,  those  consisting  of  rectangular  boxes  and 
simplexes  appear  to  suffice.  We  adopt  a  “total  degree”  convention  and  say  that  a  polynomial  in  n 
dimensions  is  of  total  degree  p  if  it  is  expressed  as  a  finite  sum  of  polynomial  terms,  each  having  the  sum 
of  the  powers  of  the  variables  being  no  larger  than  p.  We  note  that  the  total  number  of  terms  in  such 
a  polynomial  is  at  most  (n  +  p)\ / (n\p\) .  Another  convention  would  have  had  only  minor  consequences 
for  the  following  exposition. 

An  epi-spline  s  of  order  p  defined  on  lRn  with  partition  TZ  =  {-RfcjfcLi  is  a  real-valued  function  that 

on  each  Rk,  k  =  1,  is  polynomial  of  total  degree  p  and 

for  every  x  €  ]Rn,  has  s(x)  =  liminf  s(x  ). 

x'—^x 

As  in  the  one-dimensional  case,  an  epi-spline  is  polynomial  on  open  sets.  The  second  condition  ensures 
that  it  is  also  lsc  by  defining  the  value  at  boundary  points  of  the  partition  appropriately.  Figure  8 
illustrates  an  epi-spline  of  order  two  on  TR2 .  The  partition  consists  of  rectangles.  The  family  of  all 
epi-splines  of  order  p  on  TRn  with  partition  TZ  is  denoted  by  e-spl^(7£).  Since  an  s  €  e-spl^(7?.),  with 
TZ  =  {Rk}k=i,  involves  N  polynomials  of  total  order  p,  it  is  fully  characterized  by 

ne  :=  N (n  +  p)\ / (n\p\)  (1) 

parameters,  which  may  be  large  but  finite. 
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Figure  8:  Epi-spline  on  1R2 . 

Approximate  problem.  To  also  allow  for  the  possibility  of  approximations  in  the  criterion  functional 
if  and  the  feasible  set  F ,  we  define  an  approximate  criterion  ipu  and  an  approximate  feasible  set  Fu  C  T . 
Relying  on  epi-splines,  we  define  the  approximate  functions  under  consideration  to  be 

Sv  C  e-splf  (Ku)  n  F. 

The  superscript  u  =  1,2, ...  indicates  that  a  family  of  approximations  can  be  considered,  possibly  with 
gradually  more  refined  partition  1ZU ,  for  example.  These  approximations  lead  to  an 

Approximate  Problem  (P")  '■  min^v(s)  such  that  s  €  Fu  (15" 

that  approximates  the  actual  problem  ( P ). 

(Pu)  is  an  optimization  problem  over  functions,  but  they  are  all  epi-splines  and  therefore  representable 
by  a  finite  number  of  parameters.  Consequently,  (Pu)  is  a  finite- dimensional  optimization  problem  that 
is  solved  by  optimizing  over  the  Nu(n  +  pu)\/(nlpu\)  (real)  parameters  describing  the  family  of  epi- 
splines  under  consideration.  Here,  Nv  is  the  number  of  open  sets  in  the  partition  1ZU  of  Mn.  We  refer 
to  an  optimization  problem  over  such  parameters  as  a  parametric  optimization  problem  corresponding 
to  (Pu)',  see  §5  for  further  discussion.  Under  the  assumption  that  additional  complications  are  removed 
by  approximations  of  the  criterion  functional  and  feasible  set,  the  solution  of  parametric  optimization 
problems  can  be  obtained  by  standard  algorithms,  or  possibly  more  efficiently  by  specialized  algorithms 
in  certain  cases. 
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With  the  introduction  of  approximations,  we  need  to  address  the  possibility  that  solutions  of  approx¬ 
imate  problems  (Pv)  may  not  relate  to  those  of  the  actual  problem  (P).  Under  what  conditions  will 
solutions  of  (Pu)  be  approximate  solutions  of  (P)?  More  fundamentally,  can  we  approximate  any  lsc 
function  by  epi-splines?  These  questions  require  the  discussion  of  approximation  theory  of  optimization 
problems  as  reviewed  in  Section  4,  with  answers  provided  in  Section  5.  We  end  this  section,  however, 
with  examples  of  external  information  and  practical  guidelines. 

3.3  External  Information 

External  information  needs  to  be  converted  into  mathematical  forms  and  included  as  constraints  in  the 
parametric  optimization  problem  corresponding  to  ( Pu ).  The  formulation  of  such  constraints  depends 
on  the  representation  of  the  polynomials  in  an  epi-spline.  Of  course,  there  are  many  possible  represen¬ 
tations,  some  of  which  are  known  to  be  numerically  more  beneficial  than  others.  Here,  we  only  consider 
one-dimensional  epi-splines  and  the  simple  representation 

qk(x)  =  clq  +  akx  +  akx2  +  ...  +  akxp,  x  €  M,  (2) 

of  the  polynomial  describing  an  epi-spline  on  the  interval  (rrik—i,  m*,),  k  =  1,2, ...,  N.  With  N  intervals 
partitioning  ]R ,  the  epi-spline  is  defined  by  N  such  polynomials  with  a  total  of  N(p  +  1)  parameters. 
We  next  give  some  examples  of  external  information  and  the  formulation  of  corresponding  constraints 
using  this  representation;  see  [20]  for  extensions  to  higher  dimensions  and  [22]  for  constraints  related 
to  density  estimation. 

Continuity.  An  epi-spline  is  continuous  if  the  polynomials  q1 ,  q2 , ...,  qN  defining  the  epi-spline  coincide 
at  the  mesh-points  mi,  m2,  ...,  mjv-i-  This  is  easily  achieved  by  the  constraints 

Oq  +  a\mk  +  a%ml  +  ...  +  akvmpk  =  aQ+1  +  a\+1mk  +  ak+1mk  +  ...  +  ak+1mpk,  k  =  1,2, ...,  N  -  1. 

Of  course,  this  constraint  could  be  enforced  only  at  a  subset  of  the  mesh-points  to  allow  for  disconti¬ 
nuities  in  some  areas,  but  not  in  others  as  done  in  Figure  2. 

Continuous  differentiability.  An  epi-spline  is  continuously  differentiable  if  continuous,  ensured  by 
the  above  equations,  and  the  derivatives  of  the  polynomials  q1 ,  q2 , ...,  qN  coincide  at  the  mesh-points 
mi,  m2,  ...,  mjv-i-  This  is  achieved  by  the  constraints 

a\  +  la^rrik  +  ...  -j-papmf-1  =  ak+1  +  2ak+lmk  +  ...  +  pak+lmPjTl ,  k  =  1,2, ...,  N  —  1, 

where  we  assume  that  p  >  1.  Continuous  differentiability  is  automatic  for  a  continuous  epi-spline  of 
order  p  =  0. 

Fixed  values.  We  ensure  that  an  epi-spline  satisfies  the  function  value  s(x)  on  a  set  Sk  C  {mk-\,mk) 
by  the  constraints 

ao  +  a\x  +  akx 2  +  ...  +  akxp  =  s(x),  x  €  Sk- 
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This  may  at  first  give  the  impression  that  an  infinite  number  of  constraints  is  needed.  However,  if  S k 
has  more  than  p  +  1  distinct  points,  then  it  suffices  to  select  p+1  distinct  points  at  which  to  enforce  the 
constraints.  This  follows  from  the  fact  that  a  polynomial  of  degree  p  is  uniquely  defined  by  its  value  at 
p+1  points.  The  equality  can  naturally  be  replaced  by  inequality  if  only  bounds  are  available.  More¬ 
over,  if  the  values  of  an  epi-spline  is  of  no  interest  beyond  a  certain  mesh-point,  then  the  polynomials 
defining  the  epi-spline  beyond  that  point  is  of  course  immaterial  and  can  be  ignored. 

Monotonicity.  We  ensure  that  an  epi-spline  of  order  p  >  1  is  nonincreasing  by  the  constraints 

a\  +  2a\x  +  ...  +  pa*xp~l  <  0,  x  €  (mk-i,  m*,),  k  =  1, 2, ...,  N,  and 
Oq  +  airrik  +  a\m\  +  ...  +  a^mFk  >  Oq+1  +  af[+lmk  +  a2+1m|  +  ...  +  ap+1mp,  k  =  1,2, ...,  N  —  1. 

The  first  set  of  constraints  ensures  that  the  polynomials  making  up  an  epi-spline  are  nonincreasing  on 
the  intervals  on  which  they  define  the  epi-spline.  The  second  set  imposes  the  restriction  that  when 
moving  from  left  to  right,  the  epi-spline  must  not  jump  up  at  mesh-points.  The  second  requirement  is 
automatically  satisfied  for  continuous  epi-splines.  In  the  case  of  p  =  2,  the  first  condition  simplifies  to 
the  two  constraints 

a\  +  2a2?7ifc_i  <  0  and  a\  +  2 a^mk  <  0 

as,  in  that  case,  the  derivative  of  the  epi-spline  is  simply  linear.  For  any  p,  the  first  and  the  second 
conditions  hold  if 

a\  +  2a2mk  +  ...  +  paFvmvkX  <  0 

under  the  additional  assumption  that  the  epi-spline  is  convex  since  then  the  epi-spline  is  continuous 
with  a  nondecreasing  (left/right-sided)  derivative. 

Convexity.  We  ensure  that  an  epi-spline  is  convex  by  the  continuity  constraints  above  and 

2a|  +  ...  +  p(p  —  l)a,pXp-2  >  0,  x  €  (mk_i,rrik),k  =  1,  2, ...,  N,  and 
a\mk  +  2a2mk  +  ...  +  pa^mF^1  <  a^+1m,fc  +  2a2+1mk  +  ...  +  pap+1m|_1,  k  =  1,2,  ...,1V  —  1. 

The  first  set  of  constraints  ensures  that  the  second-order  derivative  of  the  polynomials  constituting  an 
epi-spline  are  nonnegative  and  therefore  convex  on  the  intervals  on  which  they  define  the  epi-spline. 
For  p  <  2,  the  constraints  are  superfluous.  The  second  set  imposes  the  restriction  that  when  moving 
from  left  to  right,  the  derivative  of  the  epi-spline  must  not  jump  down  at  mesh-points.  The  second 
requirement  is  automatically  satisfied  for  continuously  differentiable  epi-splines  and  can  be  ignored  for 
the  case  p  =  0.  If  an  epi-spline  is  of  second  order,  then  the  first  set  of  constraints  simplifies  to  >  0 
for  k  =  1, ...,  N.  If  p  =  3,  the  first  set  of  constraints  simplifies  to 

2a|  +  603777.^-1  >  0  and  2a^  +  6a^nik  >  0 

because  then  the  second-order  derivative  is  simply  linear. 
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Log-concavity  and  modality.  We  recall  that  the  composite  function  h(x)  =  e~s(x\x  €  Mn  is 
log-concave  if  and  only  if  s  is  convex.  Consequently,  it  can  be  beneficial  to  use  such  an  exponential 
transformation  of  an  epi-spline  when  log-concavity  is  desirable  as  in  the  case  of  probability  density  esti¬ 
mation;  see  [22],  In  this  case,  the  convexity  constraints  above  would  ensure  log-concavity.  Log-concavity 
implies  unimodality.  However,  the  converse  is  not  true.  It  is  more  complicated  to  ensure  unimodality 
and  not  necessarily  log-concavity.  In  the  case  of  a  continuous  function,  one  could  ensure  unimodality 
by  designating  one  mesh  point  mw  as  the  mode,  and  then  constraining  the  function  to  be  increasing 
and  decreasing  on  (mw-i,  mv)  and  (m*/,  respectively,  and  nondecreasing  on  [mo,m^_i)  and 

nonincreasing  on  (ruk'+i,  mjv]-  Solving  the  resulting  problem  gives  a  unimodal  candidate  function.  The 
process  is  repeated  for  alternative  mode  locations  mk,  k  =  0,1,...,  IV,  k  7^  k' ,  and  the  function  with 
the  best  criterion  functional  is  retained  as  the  optimal  function.  iV-modality  is  achieved  similarly  by 
partitioning  into  K  intervals,  with  each  having  a  unimodal  constraint.  The  process  must  be 

repeated  for  each  partition  of  interest.  To  specify  that  certain  are  modes  is  achieved  by  ensuring 
that  the  function  is  increasing  and  decreasing  on  (mk~i,mk)  and  {rrik,mk+ 1),  respectively. 

We  note  that  the  above  constraints,  and  many  similar  ones,  are  mostly  linear  in  the  parameters  describ¬ 
ing  an  epi-spline.  Consequently,  their  inclusion  in  a  parametric  optimization  problem  corresponding  to 
(Pu)  requires  little  additional  computational  effort. 

3.4  Implementation  Guidelines 

The  formulation  and  implementation  of  an  approximate  problem  (Pu)  in  a  specific  context  require 
the  choice  of  order  p  and  partition  7 Z.  The  order  is  usually  easy  to  choose  as  the  extensive  experi¬ 
ence  with  classical  polynomial  splines  over  more  than  half  a  century  strongly  indicates  that  low-order 
splines  are  preferred  to  higher  order  ones  as  they  avoid  the  oscillatory  behavior  of  high-degree  polyno¬ 
mials.  Typically,  order  two  or  three  is  recommended.  However,  the  consideration  of  derivatives  might 
dictate  slightly  higher  orders.  For  example,  external  information  about  a  k- th  derivative  of  the  actual 
function  dictates  that  at  least  order  k  or  maybe  k  + 1  needs  to  be  used;  see  [21,  20]  for  further  discussion. 

The  choice  of  mesh  (in  one  dimension)  is  usually  straightforward;  one  should  in  view  of  the  following 
theory  select  a  mesh  as  fine  as  is  computationally  possible.  However,  when  computing  times  are  im¬ 
portant  or  external  information  indicates  that  the  actual  function  is  “nearly  polynomial,”  one  could 
consider  coarse  meshes.  The  exact  location  of  mesh  points  could  be  dictated  by  external  information, 
for  example  about  points  of  discontinuity.  When  no  relevant  information  is  available,  a  choice  with 
evenly  dispersed  mesh  points  is  natural.  The  situation  in  higher  dimensions  is  more  complex  with  the 
multitude  of  possible  partitions.  Unless  special  external  information  is  available,  convenience  suggests 
polyhedral  partitions,  especially  boxes  and  simplexes.  In  particular,  simplex  partitions  appear  versatile 
as  they  are  compatible  with  continuous  epi-splines  in  all  dimensions  even  for  order  p  =  1;  see  [20]. 

Another  issue  is  the  representation  of  polynomials,  with  (2)  being  only  one  possibility.  In  fact,  we  know 
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Figure  9:  Epi- graph  of  a  function  /  on  1R. 

that  this  representation  leads  to  poorly  conditioned  problems  in  certain  contexts  of  interpolation  and 
fitting.  Although  we  have  found  no  need  for  going  beyond  this  representation,  or  one  that  is  normalized 
with  respect  to  mesh  point  locations,  in  the  applications  examined  thus  far,  we  foresee  that  efficiencies 
will  accrue  from  further  studies  of  this  area,  especially  in  higher  dimensions. 

4  Background  on  Epi-Convergence 

The  examination  of  epi-splines  and  the  relationship  between  the  actual  problem  (P)  and  approximate 
problems  (Pu)  rely  on  variational  analysis  and,  especially,  the  approximation  theory  of  optimization 
problems  with  the  notion  of  epi-convergence  taking  center  stage. 

In  this  context,  it  is  not  possible  to  rely  on  the  standard  pointwise  and  uniform  convergence  notions. 
Pointwise  does  not  guarantee  the  convergence  of  minimizers,  e.g.,  /"  =  1  on  [0,1]  except  for  fu(  1/zz)  =  0 
converges  pointwise  to  /  =  1  on  [0,1],  but  clearly  the  minimizers  of  the  fv  do  not  converge  to  the  set  of 
minimizers  of  /,  which  is  [0, 1].  On  the  other  hand,  uniform  convergence  will  guarantee  the  convergence 
of  the  minimizers,  but  can,  at  best,  only  be  satisfied  when  the  constraint-set  is  not  affected  by  the 
approximation (s),  e.g.,  fv  =  0  on  [0,1  —  l/v\  does  not  converge  uniformly  to  /  =  0  on  [0,1].  One 
can  view  epi-convergence  as  a  one-sided  (unilateral)  uniform  convergence  notion  that  can  handle  the 
challenges  posed  by  an  optimization  problem  and  guarantees,  in  a  way  to  be  made  precise  a  bit  later, 
the  convergence  of  the  minimizers.  Here,  we  provide  the  essential  components;  see  [18,  4]  for  a  more 
comprehensive  treatment. 
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Figure  10:  Illustration  of  distances:  dp(f,g)  =  <5  and  dp(f,g)  =  7. 

We  view  a  function  on  Kn  through  its  epi-graph 

epi  /  :=  {(x,x0)  €  Rn+1  \  f{x)  <  x0}, 

i.e.,  the  set  of  points  in  ]Rn+l  that  lies  no  lower  than  the  graph  of  /,  as  illustrated  in  Figure  9.  We  note 
that 

/  €  lsc-fcns(J?n)  if  and  only  if  epi  /  is  a  nonempty  closed  subset  of  Mn+1. 

Distances  between  such  functions  can  therefore  be  based  on  distances  between  (closed)  sets  in  Mn+1. 

We  start  with  notation  and  let  plB  =  lB(0,p)  be  the  closed  ball  in  a  Euclidean  space  centered  at  the 
origin  with  radius  p  >  0  and 

d{y,S)  :=  inf  \\y  —  y'\\ 
y'es 

be  the  standard  distance  between  a  point  y  and  a  subset  S  of  a  Euclidean  space.  We  use  the  notation 
A  +  B  =  {a +  b\  a  €  A,  b  €  B}  for  the  Minkowski  sum  of  two  sets  A  and  B  in  a  Euclidean  space. 

The  classical  Pompeiu-Hausdorff  distance  between  sets  C,  D  C  Mn, 

doo(C,  D )  :=  sup  | d(y,  C )  —  d(y,  D)\  =  inf {77  >  0  |  C  C  D  +  y]B,  D  C  C  +  ylB} 

y&Rn 

is  not  useful  in  the  present  context  as  epi-graphs  are  unbounded  sets  and  this  distance  is  easily  infin¬ 
ity.  We  therefore  turn  to  two  distances  that  depart  from  the  Pompeiu-Hausdorff  distance  in  different 
directions  (see  [2]  and  [18,  §4.H] ) . 
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Focusing  now  on  epi-graphs,  we  first  define  the  p-epi-  distance  between  two  functions  /  and  g  on  Mn, 
which  is  given  by  a  restriction  of  the  first  formula  for  the  Pompeiu-Hausdorff  distance: 

dp(f,g)=  sup  |d(x,epi/)  -  d(x,epig)\,  p>  0. 

x£pB 

Obviously,  dp(f,g)  =  doc(epi /,  epi  <7)  for  p  =  00.  As  illustrated  in  Figure  10,  dp(f,g )  gives  the  largest 
difference  in  distance  between  a  point  x  =  (x,  xq)  €  plB  C  !Rn+1  and  the  respective  epi-graphs.  In  this 
figure,  with  /(x)  =  p  —  7  for  x  >  —p  —  e  and  f(x)  =  —p  for  x  <  —  p  —  e;  g(x)  =  p  for  x  >  —p  —  5  —  e 
and  p(x)  =  —p  for  x  <  —p  —  5  —  e;  and  <5  +  e  <  p  and  e,  7  small,  the  point  x  =  (— p,  0)  provides  the 
maximizing  point,  with  d(x,epi/)  =  e  and  d(x,epig)  =  5  +  e.  Consequently,  dp(f,g )  =  5. 

The  second  distance,  the  p-epi- distance  estimate ,  relies  on  the  second  formula  for  the  Pompeiu-Hausdorff 
distance,  but  limits  the  focus  to  the  ball  centered  at  the  origin  with  radius  p: 

dp(f ,  g)  =  inf { 77  >  0  |  epi  /  0  plB  C  epi  g  +  glB,  epi  g  0  plB  C  epi  /  +  glB},  p>  0. 

Again,  dp(f,g )  =  doo(epi /,  epi 5)  for  p  =  00.  Figure  10  illustrates  dp,  which  involves  “padding”  the 
first  epi-graph  so  that  it  includes  the  second  one,  and  vice  versa.  In  this  figure,  epi  g  n  plB  contains 
the  single  point  (0,p)  which  is  clearly  contained  in  epi/,  and  epi g  only  needs  to  be  padded  with  7  to 
ensure  that  epi  /  n  plB  C  epi  g  +  7 IB.  Consequently,  dp(f,g )  =  7. 

Neither  dp  nor  dp  is  a  metric  on  lsc-fcns(J?n).  The  p-epi-distance  dp  is  a  pseudo- metric,  but  dp  is  not. 
Obviously,  if  plB  is  “small,”  then  dp(f,g )  may  very  well  be  zero  without  /  and  g  being  equal.  It  is 
apparent  that  all  values  of  p  need  to  come  into  play.  This  is  accomplished  in  the  epi- distance 

r  00 

d(f,g)  ■■=  /  dp(f,g)e~pdp, 

Jo 

which  then  leads  to  the  following  fact: 

d  is  a  metric  on  the  space  lsc-fcns {Mn). 

We  say  that  functions  fv  €  lsc-fcns (Mn)  epi-converges  to  a  function  /  €  lsc-fcns(J?n)  if  d(f^,f)  — »  0. 
Epi-convergence  is  also  characterized  by  dp  and  dp : 

For  functions  fu,  f  G  lsc-fcns (Mn)  and  p  >  0,  the  following  is  equivalent: 

d(r,/)^o 

dp{fu,f )  -1  0  for  p  >  p 
dP(fU,f)  0  for  p>  p. 

The  epi-distance  induces  the  epi-topology  (in  topology  circles  referred  to  as  the  Attouch-Wets  topology) 
on  lsc-fcns (Mn).  In  fact,  by  Theorem  7.58  of  [18],  (lsc-fcns^oo (!Rn),d)  is  a  complete  metric  space.  It  is 
also  separable  and  therefore  a  Polish  space,  with,  in  fact,  epi-splines  given  in  terms  of  polynomials  with 
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rational  coefficients  furnishing  a  countable  dense  subset  [20].  However,  lsc-fcns (Mn)  is  not  a  vector 
space5  as  — /  0  lsc-fcns(J?n)  for  /  €  lsc-fcns(J?n)  given  by  f(x)  =  0  if  x  G  [0,oo)n  and  f(x)  =  1 
otherwise.  We  note  however  that  it  is  a  cone,  i.e. ,  A /  €  lsc-fcns(.ff?n)  for  A  >  0  and  /  €  lsc-fcns (J?n). 
Moreover,  lsc-fcns(^?n)U{oo}  is  convex,  where  we  have  included  the  function  that  is  infinity  everywhere. 

A  parallel,  slightly  more  general,  development  focusing  on  nonempty  closed  sets  in  Mn,  instead  of 
epi-graphs  of  lsc  functions,  is  also  possible  and  leads  to  set  convergence  in  the  sense  of  Painleve- 
Kuratowski.  Such  convergence  is  characterized  by  d,  dp,  and  dp,  similar  to  the  above  characterization 
of  epi-convergence,  with  the  slight  change  in  definition  of  these  distances  consisting  of  replacing  epi¬ 
graphs  by  closed  sets;  see  [18,  Chapter  4]  and  [6]  for  further  details  about  topologies  on  closed  sets. 

Connections  between  the  various  distances  are  provided  by  [18,  Exercise  7.60].  For  example,  for  func¬ 
tions  f,g  on  Mn  not  identical  to  oo, 

(i)  dp(f,g)  <  dp(f,g )  <  dp>(f,g),  when  p'  >  2p  +  max{d(0,  epi  /),  d(0,  epi#)} 

(h)  d(f,g)  >  (1  -  e-p)|d(0,  epi  /)  -  d(0,epi#)|  +  e~pdp(f,g) 

(hi)  d(f,g)  <  (1  -  e~p)dp(f,g )  +  e"p(max{d(0,  epi  /,  d(0,  epi  g)}  +  p+  1) 

(iv)  dp(f,g )  =  dp(f,g)  for  any  p  >  0  if  f,g  convex  with  /( 0)  <  0  and  g( 0)  <  0. 

As  is  clear  from  these  results,  the  distances  are  tied  to  the  origin  of  Mn+1.  Although  formulae  for 
other  “anchor”  points  are  also  possible,  the  simplest  way  of  utilizing  these  estimates  is  to  translate  the 
functions  /  and  g  favorably.  Item  (iv)  highlights  the  possibilities  in  that  direction  in  the  case  of  convex 
functions. 


It  is  apparent  that 


and  since  J0°°  e  Pdp 


dp{f,g)  <  ]]/ -slloo  :=  sup  \  f(x)-g(x)\ 

x£Rn 


1,  that  also 


d(f,g)  <  ]]/  -  tfHoo 


Of  course,  with  the  possibilities  of  the  function  values  — oo  and  oo,  the  right-hand  sides  may  easily  be 
oo  making  reliance  on  ||  •  ||oo  impractical.  Even  for  hnite- valued  functions,  it  is  clear  that  convergence  in 
the  epi-distance  does  not  imply  convergence  in  ||  •  Hoc  as  the  following  simple  example  shows.  Suppose 
that  fu(x)  =  0  if  x  <  0  and  fu(x)  =  x/v  otherwise.  We  also  define  f°(x)  =  0  for  all  r  €  fi;  see 
Figure  11.  Clearly,  \\fv  —  /°||oo  =  oo  for  all  n.  However,  for  any  p  >  0,  dp(f^,f0)  <  p/u  and  therefore 
d(fu,  f°)  <  {l/v)  f0°°  pe~pdp  =  l/v.  Consequently,  d(fu,  f°)  ->•  0,  but  \\fu  -  /°|| ^  y4  0. 


5We  define  addition  of  functions  and  multiplication  with  a  scalar  in  the  usual  “pointwise”  manner.  To  handle  extended 
real- values,  we  adopt  the  conventions  that  oo  +  a  =  oo  and  — oo  +  a  =  — oo  for  a  G  R,  oo  +  oo  =  00  + (—oo)  =  —00  +  00  =  00, 
A  •  00  =  —00  for  A  <  0,  0  ■  00  =  0,  and  A  •  00  =  00  for  A  >  0,  and  similarly  for  A  •  (—00). 
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Pointwise  convergence  of  a  sequence  of  functions  is  also  not  equivalent  to  epi-convergence  even  if  both 
pointwise  and  epigraphical  limits  exist  as  the  following  simple  example  shows.  Suppose  that  fu(x )  =  0 
if  x  =  \/v  and  fu(x)  =  1  otherwise.  Clearly,  the  pointwise  limit  of  /",  i.e. ,  the  function  defined 
by  linij,  fu(x)  for  every  x ,  is  the  function  that  is  unity  everywhere.  In  contrast,  fu  epi-converges  to 
the  function  f°  with  f°(x)  =  0  for  x  =  0  and  f°(x)  =  1  otherwise,  i.e.,  d(fu,f°)  — >•  0.  In  fact, 
dp(fuJ°)  =  cip(fu1  f°)  =  1/u;  see  Figure  12. 

Specific  estimates  for  epi-splines  are  provided  in  [20]:  For  s,  s'  €  e-spl^(7£),  with  1Z  =  one  has 

for  any  p>  0, 


(i)  d(s,s')  <  max  sup  | s(x)  —  sVx)! 

k=l,...,N  xGRk 

(ii)  dp(s,s')  <  max  sup  |s(a:)  —  s;(a:)| 

k=i,...,N  xeRk 

(iii)  dp(s,  s')  <  max  dp(sk,s'k). 

k=l,...,N 

These  results  provide  a  quantification  of  epi-convergence  of  functions  in  lsc-fcns(J?n).  However,  we  need 
to  go  one  step  further  and  define  epi-convergence  of  optimization  problems  defined  over  such  functions. 

We  next  provide  a  definition  of  epi-convergence  of  the  approximate  problems  (Pv)  to  the  actual  problem 
(P).  In  view  of  the  above  results,  these  are  problems  defined  on  the  metric  space  (lsc-fcns (Mn),d)  and 
we  therefore  follow  [3]: 

A  sequence  {(Pu)}u&iy  epi-converges  to  (P)  if 
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Figure  12:  Illustration  of  epi-convergence  and  pointwise  convergence. 


(i)  for  every  sequence  {sl'}u&k,  with  K  an  infinite  subsequence  of  the  positive  integers,  su  €  Fu  C\SU, 
and  <i(s1',/)  — >  0,  we  have  that  /  G  F  and  liminfj,  i^u{su)  >  -*/>(/); 

(ii)  for  every  f  £  F,  there  exists  a  sequence  {su}^=l,  with  sv  £  Fv  n  Su,  such  that  d(su ,  /)  — >  0  and 
limsupy  ijjv{sv)  <  ip(f). 

The  main  consequence  of  epi-convergence  in  this  context  is  the  fact  that  it  implies  that  solutions  of  the 
approximate  problems  tend  to  those  of  the  actual  problem.  Let  V  and  Vv  be  the  minimum  values  of 
(P)  and  ( Pv ),  respectively.  Specifically,  by  [3,  Theorem  2.5], 

if  (Pv)  epi-converges  to  (P),  sk  minimizes  ( PVk ),  and  d(sk ,  /)  — >•  0,  then  /  is  a  minimizer 
of  (P)  and  lirrifc  VUk  =  V. 

This  result  therefore  provides  a  direct  path  to  an  answer  to  the  earlier  question  whether  a  solution 
of  an  approximate  problem  (Pu)  would  be  an  approximate  solution  of  (P):  one  needs  to  ensure  epi- 
convergence  of  ( Pu )  to  (P). 

5  Theory 

A  series  of  results  are  available  about  the  relationship  between  (Pu)  and  (P)  and  their  solutions  as 
well  as  between  epi-splines  and  lsc  functions.  Results  in  the  context  of  probability  density  estimation 
are  established  in  [22].  We  refer  to  [21]  for  other  cases  with  one-dimensional  functions  and  [20]  for 
higher  dimensions.  Here,  we  give  three  central  results  dealing  with  epi-convergence  of  ( Pu )  to  (P),  the 
ability  of  epi-splines  to  approximate  to  an  arbitrary  level  of  accuracy  any  lsc  function,  and  the  rate  of 
convergence  of  corresponding  solutions. 

Epi-convergence.  A  sufficient  condition  for  epi-convergence  of  (Pv)  to  the  actual  problem  (P)  relies 
on  continuous  convergence  of  the  approximate  criterion  i\jv  to  the  actual  criterion  i/j.  Often,  i\)v  is  simply 
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identical  to  for  all  v.  For  the  general  case,  we  recall  that  i/ju  converges  continuously  to  ^  relative  to 
F  if  for  every  /  €  F  and  sequence  {sI/}£T1,  with  sv  €  F  and  d(su,  f)  — »•  0,  We  use  the 

notation  inti7  to  denote  the  interior  of  F  C  F  (with  respect  to  the  epi-topology) .  We  establish  in  [20] 
that  if 

(i)  converges  continuously  to  -0  relative  to  F 

(ii)  U’^L1SU  is  dense  in  F 

(iii)  Fu  set-converge  to  F  =  cl(intF), 
then  (Pv)  epi-converges  to  (P). 

The  first  condition  implies  that  is  continuous  with  respect  to  the  epi-topology.  The  second  condition 
relates  to  the  ability  of  epi-splines  to  approximate  arbitrary  lsc  functions,  which  we  address  in  the 
next  paragraph.  The  last  part  of  the  third  condition  is  a  constraint  qualification  that  avoids  “isolated” 
feasible  points  that  cannot  easily  be  approximated.  Numerous  simplification  occurs  if  F  C  e-spl([  (7£) 
for  some  partition  1Z  and  order  p,  which  implies  that  both  ( Pv )  and  (P)  are  finite-dimensional,  as 
discussed  at  the  end  of  the  section. 

Approximation  by  epi-splines.  To  allow  for  approximation  of  arbitrary  lsc  functions,  we  consider 
a  sequence  of  partitions  that  are  gradually  refined.  Specifically,  we  say  that  a  sequence  of 

partitions  of  lFLn,  with  1ZV  =  {Rk}kl1:  is  an  infinite  refinement  if 

for  every  x  €  Mn  and  e  >  0,  there  exists  a  positive  integer  v  such  that 
Rvk  C  ]B(x,e)  for  every  v  >  v  and  k  satisfying  x  €  cl Rfik. 

A  simple  example  of  an  infinite  refinement  on  M  is  to  take  Nv  =  2v  +  2,  R\  =  (— oo,  —  y/v),  Rk  = 
(( k  —  v—  2)/y/v,  [k  —  v  —  1)/ y/u)  for  k  =  2, 3, ...,  2u  +  1,  and  R^u+2  =  °°)-  Then  u  >  max{x2,  e”2} 

satisfies  the  above  condition.  Obviously,  much  flexibility  exists  in  constructing  such  infinite  refinements. 

A  main  result  in  [20]  is  then  that 

for  any  nonnegative  integer  p  and  an  infinite  refinement  on  Mn, 

OO 

|^J  e-spl^T^17)  is  dense  under  the  epi-topology  in  lsc- fens (JRn). 

u=\ 


Consequently,  epi-splines,  even  those  of  order  zero,  can  approximate  to  an  arbitrary  level  of  accuracy 
any  lsc  function.  The  optimization  over  epi-splines  in  (Pu)  is  therefore  justified. 

Rate  of  convergence.  Solutions  of  ( Pu )  tend  to  those  of  (P)  at  rates  that  depend  on  several  factors 
and  a  full  analysis  along  the  lines  of  [5]  and  references  therein  is  beyond  the  scope  of  this  tutorial.  Here, 
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we  consider  the  case  in  which  T  C  e-spl^(72.)  for  some  partition  1Z  =  {Rk}k=i  and  order  p,  i.e. ,  the 
actual  problem  (P)  is  finite  dimensional.  Of  course,  in  view  of  the  flexibility  of  epi-splines,  there  may 
be  little  to  lose  from  a  practical  perspective  to  adopt  this  simplification  from  the  outset.  In  view  of  the 
finite  number  of  parameters  describing  an  epi-spline,  (P)  corresponds  then  to  a  parametric  optimization 
problem 

Parametric  Actual  Problem  (P)  :  min  ip(r)  such  that  r  €  R  C  JRUe, 

where  R  is  a  feasible  region  corresponding  to  the  constraint  set  F,  ip  is  a  criterion  function  expressed  as 
a  function  of  the  parameters  r,  and  ne  is  defined  in  (1).  If  every  s  €  e-spl^(P)  is  of  the  form  s  =  (c(-),  r), 
with  r  €  Mne  and  c  =  (ci,C2,  ...,cne)  a  set  of  basis  functions6  for  the  corresponding  polynomials,  then 
ip(r)  =  ip((c(-),r))  and  R  =  {r  €  lRne  |  (c(-),r)  €  F}. 

Similarly,  a  parametric  optimization  problem  corresponding  to  ( Pu )  takes  the  form 

Parametric  Approximate  Problem  (Pv)  :  mint/P (r)  such  that  r  €  Ru , 

where  Ru  =  {r  €  JRne  |  (c(-),r)  €  FunSu}  and  ipv{r)  =  ipu((c(-),  r)).  We  quantify  next  the  relationship 
between  solutions  of  (Pu)  and  (P). 

We  focus  here  on  near-optimal  solutions  of  (Pu)  that  are  indeed  the  solutions  provided  by  numerical 
solvers.  A  benefit  of  such  an  approach  is  that  we  avoid  quantifying  the  “conditioning”  of  the  problems’ 
near-optimal  solutions;  see  [18,  Section  7.J]  for  results  in  that  direction.  We  therefore  define  the  e- 
optimal  solutions  for  ( Pv ),  which  for  e  >  0,  are  given  by 

R"  =  {r£Ru  |  ipu(r)  <VU +  £}, 

with  Vu  being  the  optimal  value  of  (Pv).  We  also  say  that  a  function  ip  is  Lipschitz  on  plB  with  constant 
k  if 

| tp(r)  —  <p(r') |  <  n\\r  —  r'\ |  for  all  r,  r'  €  plB. 

A  quantitative  approximation  result  can  then  be  deduced  from  Theorem  7.69  and  Example  7.62  in  [18]: 


Suppose  that  ip, ip"  are  finite  convex  functions,  R,RU  are  closed  convex  sets,  and  po  € 
(0,  oo)  is  such  that  there  exists  r,rv  €  poJB  optimal  in  (P)  and  ( Pv ),  respectively,  and 
ip{r),ip1' (rv)  >  —po-  Let  p  >  po,  p ’  >  dp(R,Ru)  +  p,  and  assume  that  ip,  ip v  are  Lipschitz 
on  p'lB  with  constant  k.  Then,  for  any  e  >  0  and  r°  optimal  in  (P), 


d(r°,K)< 


ma  x{ip(r)  —  ipu{r)}  +  (k  +  1  )dp(R1  Rv) 

r£pB 


We  observe  that  an  error  between  an  optimal  solution  of  (P)  and  originates  from  two  sources:  the 
differences  between  the  criterion  functions  ip  and  ipu  and  between  the  restrictions  imposed  through  R 
and  Ru  measured  in  terms  of  the  dp-distance,  now  defined  for  arbitrary  sets  and  not  only  epi-graphs. 

6There  are  several  possible  basis  functions  with  (1,  x,  x2, xp )  for  each  interval  (rrik-i,  rrik)  furnishing  one  example  on 
Jt;  see  (2). 


25 


£= 

o 

o 

c 

<D 

E 


(a) 


(b) 


Figure  13:  Image  reconstruction  example:  actual  function  (a)  and  epi-spline  (b). 


6  Examples 

We  provide  numerical  results  for  three  gradually  more  complex  examples.  The  first  example  estimates  a 
common  test  function  from  image  reconstruction.  The  second  example  estimates  the  probability  density 
of  simulation  output.  The  third  example  forecasts  electricity  (load)  demand  on  the  basis  of  weather 
predictions. 

Image  reconstruction.  We  consider  the  standard  test  function  f(x)  =  (cos(7rxi)  +  cos(7tx2))3  defined 
on  [ — 3, 3] 2 ;  see  Figure  13(a).  We  rely  on  continuous  epi-splines  and  a  partition  of  [ — 3, 3]2  consisting 
of  20  by  20  squares  so  that  N  =  400.  Based  on  900  uniformly  distributed  data  points  and  the  max 
absolute  error  criterion,  we  obtain  the  epi-spline  fit  of  Figure  13(b).  Maximum  and  average  errors  across 
the  data  points  are  0.415  and  0.313,  respectively.  Mean  square  and  mean  absolute  error  criteria  give 
similar  results.  A  relaxation  of  the  continuity  requirement  results  in  essentially  perfect  interpolation  at 
the  expense  of  a  more  “rugged”  fit. 

Simulation  output  estimation.  Estimates  of  the  probability  density  function  of  the  output  of  sim¬ 
ulation  model  provide  a  comprehensive  picture  of  the  performance  of  the  system  modelled.  Although 
kernel  methods  and  other  traditional  methods  of  nonparametric  statistics  compute  density  estimates, 
they  rely  heavily  on  large  sample  sizes  and  only  in  parts  can  account  for  external  information,  a  preva¬ 
lent  factor  in  practical  simulation  studies.  As  laid  out  in  [22],  probability  density  estimation  using 
epi-splines  offers  the  possibility  to  include  an  arbitrary  collection  of  external  information  and  thereby 
achieve  high-quality  estimates  even  for  small  sample  sizes. 

For  a  sample  A'1,  A2,  ...,XU  that  is  independently  and  identically  distributed  according  to  an  unknown 
density,  it  is  well-known  (see  for  example  [22])  that  a  constrained  maximum  likelihood  estimator  of  the 
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Figure  14:  Density  of  queuing  simulation, 
density  is  an  optimal  solution  of 

1  A.  r 

max  —  n  h[Xl)xlv  such  that  h>  0,  /  h(x)dx  =  l,h  €  H, 

u  »= l 

where  74  is  an  appropriately  selected  space  of  functions  on  Passing  through  the  transformation 
h  =  we  arrive  after  equivalently  maximizing  the  logarithm  of  the  objective  function  of  the  problem 

1  .  f 

min  —  ^  f(Xl)  such  that  /  e~^x'>dx  =  1  ,/gF, 

^  i=i  •' 

where,  as  above,  we  let  J7  C  lsc-fcns(J?").  Since  this  problem  is  infinite-dimensional,  we  formulate  the 
approximate  problem 

1  ^ 

min  —  ^2  s(Xl )  such  that  s  €  Fu  C\SU 
i= 1 

over  epi-splines,  where  5"  =  e-spl^(72.)  H  J7,  as  above,  and  Fu  is  a  subset  of  functions  satisfying 
f  e~s^dx  =  1,  but  also  satisfying  other  restrictions  derived  from  external  information.  A  solution  s  of 
the  approximate  problem  provides  a  density  estimator  through  the  composition  e~s ,  where  we  observe 
that  the  nonnegativity  is  automatically  satisfied. 

An  example  taken  from  [24]  illustrates  the  possibilities.  We  consider  an  M/M/1  queue  with  arrival  rate 
A  =  1  and  service  rate  /r  =  1.5,  but  50%  of  customers  who  enter  the  system  are  held  at  a  separate 
station  for  two  time  units  before  entering  the  queue.  We  wish  to  estimate  the  density  of  the  customer 
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time-in-service  (TIS).  The  actual  density  is  an  equal  mixture  of  the  density  corresponding  to  an  M/M/1 
queue  without  the  holding  station,  which  is  exponential,  and  the  same  density  shifted  to  the  right  by 
two  time  units.  The  resulting  density  is  discontinuous  at  two  time  units  as  half  the  customers  will  have 
two  time  units  added  to  their  time  in  system;  see  the  dotted  line  in  Figure  14. 

We  assume  that  this  detailed  information  about  the  density  for  customer  TIS  is  unavailable  and  that 
we  must  rely  on  100  simulated  customer  TIS  values  as  well  as  reasonable,  external  information.  Of 
course,  in  practice  this  is  the  common  situation.  We  build  a  second-order  epi-spline  estimate  by  solving 
the  above  approximate  problem,  under  the  external  information  that  the  customer  TIS  is  nonnegative, 
derivatives  relative  to  density  values  are  bounded  (i.e.,  a  bound  on  the  pointwise  Fisher  information), 
second-order  derivatives  are  bounded,  and  the  upper  tail  is  log-concave.  Using  a  mesh  consisting  of 
N  =  10  intervals,  we  obtain  the  estimate  given  by  the  solid  line  in  Figure  14.  The  picture  is  representa¬ 
tive  of  hundreds  of  draws  of  100  simulated  customer  TIS.  We  also  illustrate  a  standard  Gaussian  kernel 
estimate.  The  stems  indicate  the  100  simulated  customer  TIS.  The  estimate  based  on  epi-splines  tracks 
the  actual  density  closely  and,  in  contrast  to  the  kernel  estimate,  captures  its  essence.  From  this  and 
similar  examples  in  [24] ,  we  conclude  that  epi-spline  technology  provides  a  tool  for  analyzing  simulation 
output  under  a  variety  of  external  information.  In  particular,  flexibility  of  this  kind  is  important  when 
simulations  are  expensive  and  only  a  small  sample  can  be  made  available.  The  burden  then  falls  on  the 
external  sources  to  provide  information  that  can  sharpen  performance  estimates  and  support  decision 
about  the  underlying  system. 

Electricity  load  forecast.  A  significant  portion  of  electricity  supply  in  the  United  States  is  provided 
through  “day-ahead”  contracts,  where  producers  agree  to  deliver  certain  quantities  of  electricity  for 
the  next  day.  A  central  component  of  a  market  place  for  such  contracts  is  forecasting  tools  for  the 
next  day’s  electricity  load  (demand).  Such  tools  must  necessarily  rely  on  the  information  available 
at  the  time  of  forecasting,  which  typically  includes  weather  forecasts  for  the  next  day  (temperature, 
dew  point,  cloud  cover,  etc.)  as  well  as  historical  information  about  electricity  load  on  days  that  had 
similar  weather  forecasts.  Figure  15(a)  shows  temperature  forecasts  for  five  days  in  Boston  in  2012, 
with  corresponding  actually  observed  electricity  loads  on  those  days  shown  in  Figure  15(b).  Although, 
the  historical  information  may  stretch  decades  back,  the  useful  information  is  much  more  limited  as  the 
data  needs  to  be  carefully  segmented  to  ensure  that  only  “similar”  days  are  included;  see  [13,  14].  For 
example,  data  for  Mondays  should  not  be  mixed  with  those  for  Saturdays  as  the  electricity  load  are 
fundamentally  different  on  the  weekend,  and  data  for  hot  days  should  not  be  mixed  with  those  of  cool 
days.  In  the  end,  only  about  15  to  25  days  per  year  may  remain  and  on  which  the  forecast  must  rely, 
making  traditional  techniques  based  on  time-series  or  stochastic  differential  equations  inaccurate.  Here, 
we  briefly  describe  the  epi-spline-based  construction  of  a  stochastic  process  of  the  next  day’s  electricity 
load  as  laid  out  in  [13,  14];  see  also  [16].  Epi-splines  enter  at  three  points.  First,  to  help  segment 
the  data,  probability  density  functions  are  estimated  using  an  approach  similar  to  that  described  for 
simulation  output  above.  Second,  a  regression  model  builds  an  estimate  of  the  “expected”  load  for  the 
full  24  hours  of  the  next  day.  Third,  conditional  probability  density  estimates  quantify  the  variability 
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(a) 


(b) 


Figure  15:  Temperature  forecasts  (a)  and  corresponding  actual  loads  (b)  for  five  days. 
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Figure  16:  Probability  densities  of  errors  during  two  hours  of  the  day. 
in  the  process.  We  here  describe  the  two  last  steps  only. 

Suppose  that  there  are  J  days  of  historical  data  on  which  to  base  the  estimates.  For  each  day  j  = 
1, 2, ...,  J,  and  hour  h  =  1,2, ...,  24,  of  that  day,  we  know  the  forecasted  temperature  t3h ,  the  forecasted 
dew  point  d3h,  and  the  actually  observed  load  lJh.  Additional  weather  statistics  are  also  available,  but 
excluded  here  for  simplicity  of  exposition.  In  any  case,  temperature  and  dew  point  forecasts  appear  to 
suffice  for  the  estimation  of  summer  days  [13,  14],  The  regression  model  then  takes  the  form 

i’„  =  +  sd-‘(/.)4  +  4. 

where  stmp  and  sdpt  are  epi-splines  defined  on  [0,  24]  and  e3h  is  the  error  between  the  observed  load 
fh  and  the  predicted  load  strap(h)t3h  +  sdpt(h)d3h.  We  note  that  here  the  “coefficients”  in  front  of  the 
explanatory  variables  “temperature”  and  “dew  point”  are  functions  of  time.  The  regression  problem  is 
then  to  determine  epi-splines  stmp  and  sdpt  that 

J  24 

minimize 

j= 1 h= 1 

with  the  external  information  that  the  epi-splines  are  of  second  order,  are  continuously  differentiable, 
and  have  bounded  second-order  derivatives.  Using  the  representation  of  polynomials  in  (2)  and  mesh- 
points  rrik  =  k2A/N ,  k  =  1,  2, ...,  N,  partitioning  [0,  24]  into  N  intervals,  we  obtain  that 

stmp(x)  =  aQ,tmp  +  4’tmpx  +  a2,tmpx2,  for  x  €  mk), 

with  continuity  forcing  the  epi-spline  on  the  mesh-points  to  be  defined  by  the  values  at  the  adjacent 
intervals;  sdpt  is  defined  similarly  in  terms  of  parameters  aQ,dpt,  a^’dpt,  and  a2,dpt.  Let  mo  =  0.  The 
regression  problem  then  takes  the  form  of  a  linear  program  over  those  parameters  as  well  as  over  the 
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Scenario  loads  for  date  2011-06-08  and  zone  CT 


Figure  17:  Regression  load,  with  estimated  uncertainty,  and  actual  load, 
auxiliary  parameters  zJh: 

J  24 

min  EE  z3h  such  that 
3=1  h=  1 


lj  - 

Lh 

/  /c.tmp 
■(«0 

+  ak’tmph  +  ak2’tmph2)t{  -  {ak’dpt 

+  ak’dpth  +  ak,dpth2)d3h  < 

zh' 

h 

€  (mfc 

-1: 

mk] 

(3) 

k  ~- 

=  1, 

..., 

AT,  j  = 

=  1, 

...,J 

-It  +  (aS'*"p 

+  ak,tmph  +  ak’tmph2)t{  +  (ak’dpt 

+  ak’dpth  +  ak’dpth2)di  < 

0 

/i 

€  (mk 

-1) 

mk] 

(4) 

k 

=  1, 

..., 

N,  j  = 

=  1, 

...,J 

/c.tmp 

«0 

+  af mp 

i  ,  /c.tmp  2  /  /c+l,tmp  ,  /c+l,tmp  ,  Zc+l,tmp  2\ 

mk  +  a2  mfc  —  (a0  +  a1  mk  +  a2  = 

o, 

k 

=  1,... 

,N 

-  1 

(5) 

/c, tmp  ,  o  fc,tmp  / 

ax  +  za2  rrik  ~  [a 

fc+qtmp  +  2ak+1'tmpmk)  = 

0, 

k 

=  1,- 

,N 

-  1 

(6) 

^  k,  dpt  k,  tmp  ^ 

t\i  2  7  02  — 

K, 

k 

=  1,- 

,N 

(7) 

The  inequalities  (3)  and  (4)  linearize  the  criterion  function,  (5)  and  (6)  ensure  continuity  and  continuous 
differentiability  requirements  (see  §3.3),  and  (7)  bounds  the  magnitude  of  the  second-order  derivatives 
to  k.  In  [14],  N  =  24  and  k  =  500.  A  slightly  more  complicated  version  with  weighted  errors  is  also 
possible. 

The  regression  curve  can  be  viewed  as  the  drift  of  the  stochastic  process  that  forecasts  the  electricity 


31 


load.  The  next  step  is  to  determine  the  volatility  of  the  process.  For  a  fixed  hour  h,  we  can  interpret 
the  minimized  errors  {e^}^=1  as  the  observations  associated  with  the  distribution  of  these  errors.  Using 
the  epi-spline-based  procedure  for  estimating  probability  densities  described  above,  it  is  then  possible 
to  estimate  this  distribution.  Since  the  number  of  observed  errors  will  be  limited,  it  becomes  especially 
critical  to  include  external  information  to  ensure  quality  estimates.  We  can  repeat  this  process  for  each 
hour  h  independently,  but  that  would  not  take  into  account  conditioning.  Clearly,  if  the  load  at  hour 
h  is  substantially  higher  than  the  load  tentatively  projected  by  the  regression  curve,  one  should  take 
this  into  account  when  looking  at  the  distribution  of  the  errors  at  a  later  time  h  +  Ah.  This  can  be 
done  systematically  by  restricting  the  samples  of  the  error  at  time  h  +  Ah  to  those  that  come  from 
(observed)  load  trajectories  that  at  time  h  had  similar  deviations  from  the  overall  drift  of  the  process, 
i.e.,  from  the  regression  curve;  see  [13,  14,  16]  for  details.  Since  this  limits  the  number  of  data  point  on 
which  a  single  density  estimate  must  rely,  the  significance  of  being  able  to  include  external  information 
in  the  epi-spline  framework  is  further  highlighted. 

Extensive  numerical  results  are  provided  by  [13,  14]  for  case  studies  covering  the  New  England  region 
of  the  United  States.  Figure  16  from  [13]  illustrates  two  epi-spline-based  estimates  of  the  probability 
density  function  of  errors  at  hours  h  =  6  and  17  on  a  particular  day.  For  a  day  with  high  variability  in 
electricity  load  taken  from  [14],  Figure  17  gives  an  example  of  estimated  loads  in  2011  in  Connecticut 
according  to  the  regression  model  (line  with  circles),  the  associated  uncertainty  given  in  terms  of 
alternative  scenarios  (lines  with  solid  dots),  and  the  actual  load  (line  with  squares).  It  is  clear  that  the 
constructed  stochastic  process  covers  the  actual  load  to  a  high  degree.  Table  1  taken  from  [14]  shows 
aggregated  forecasting  error  statistics  in  terms  of  mean  average  percent  forecasting  errors  from  actual 
loads  for  different  seasons  and  numbers  of  segments  used  in  the  procedure  for  segmenting  the  data 
into  clusters  of  “similar”  days.  We  find  that  regardless  of  the  season,  the  forecasting  errors  are  small, 
especially  under  an  intelligent  segmentation  of  the  data.  The  epi-spline-based  forecasting  method  of 
[13,  14]  is  currently  being  implemented  by  an  independent  software  provider,  with  planned  adoption  by 
the  New  England  Independent  System  Operator  to  support  daily  planning  in  2014. 


Season 

Numbers  of  segments 

1 

3 

5 

7 

Fall 

5.45 

4.66 

4.2 

3.99 

Spring 

3.1 

2.88 

2.67 

2.73 

Summer 

10.25 

4.82 

4.14 

4.19 

Winter 

5.25 

3.32 

3.29 

3.47 

Table  1:  Mean  average  percent  forecasting  error  in  electric  load  forecast. 
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